OCR exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Rejection Threshold Estimation for an Unknown Language Model in an OCR Task

Internal identifier: 000684 (Main/Exploration); previous: 000683; next: 000685

Authors: Joaquim Arlandis [Spain]; Juan-Carlos Perez-Cortes [Spain]; Ramon Navarro-Cerdan [Spain]; Rafael Llobet [Spain]

Source:

RBID: ISTEX:9783140E698735B202412CDF7971320FDA579561

Abstract

In an OCR post-processing task, a language model is used to find the best transformation of the OCR hypothesis into a string compatible with the language. The cost of this transformation is used as a confidence value to reject the strings that are less likely to be correct, and the error rate of the accepted strings should be strictly controlled by the user. In this work, the expected error rate distribution of an unknown language model is estimated from a training set composed of known language models. This means that after building a new language model, the user should be able to automatically “fix” the expected error rate at an acceptable level instead of having to deal with an arbitrary threshold.
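The rejection idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' estimation method: it assumes labelled validation data for a single known language model, given as hypothetical (cost, is_correct) pairs, and picks the largest cost threshold whose accepted strings stay within a target error rate. The function name `pick_threshold` and the sample values are invented for illustration; the abstract gives no such data.

```python
from typing import List, Optional, Tuple


def pick_threshold(
    samples: List[Tuple[float, bool]], max_error_rate: float
) -> Optional[float]:
    """Return the largest cost threshold such that strings with
    cost <= threshold have an error rate of at most max_error_rate.
    Returns None when no threshold satisfies the target."""
    ordered = sorted(samples, key=lambda s: s[0])
    best = None
    errors = 0
    for i, (cost, correct) in enumerate(ordered, start=1):
        if not correct:
            errors += 1
        # Strings with equal cost are accepted or rejected together,
        # so only evaluate at the last sample of a run of ties.
        if i < len(ordered) and ordered[i][0] == cost:
            continue
        if errors / i <= max_error_rate:
            best = cost
    return best


# Hypothetical validation data: (transformation cost, was the output correct?)
samples = [
    (0.1, True), (0.2, True), (0.5, False),
    (0.7, True), (0.9, False), (1.2, True),
]
threshold = pick_threshold(samples, max_error_rate=0.25)
print(threshold)
```

A string is then accepted when its transformation cost does not exceed the returned threshold. The paper goes further by estimating the expected error rate for a language model with no labelled data of its own, using a training set of known language models; the sketch does not attempt that step.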

DOI: 10.1007/978-3-642-14980-1_73


The document in XML format

<record>
<TEI wicri:istexFullTextTei="biblStruct">
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Rejection Threshold Estimation for an Unknown Language Model in an OCR Task</title>
<author>
<name sortKey="Arlandis, Joaquim" sort="Arlandis, Joaquim" uniqKey="Arlandis J" first="Joaquim" last="Arlandis">Joaquim Arlandis</name>
</author>
<author>
<name sortKey="Perez Cortes, Juan Carlos" sort="Perez Cortes, Juan Carlos" uniqKey="Perez Cortes J" first="Juan-Carlos" last="Perez-Cortes">Juan-Carlos Perez-Cortes</name>
</author>
<author>
<name sortKey="Navarro Cerdan, Ramon" sort="Navarro Cerdan, Ramon" uniqKey="Navarro Cerdan R" first="Ramon" last="Navarro-Cerdan">Ramon Navarro-Cerdan</name>
</author>
<author>
<name sortKey="Llobet, Rafael" sort="Llobet, Rafael" uniqKey="Llobet R" first="Rafael" last="Llobet">Rafael Llobet</name>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">ISTEX</idno>
<idno type="RBID">ISTEX:9783140E698735B202412CDF7971320FDA579561</idno>
<date when="2010" year="2010">2010</date>
<idno type="doi">10.1007/978-3-642-14980-1_73</idno>
<idno type="url">https://api.istex.fr/document/9783140E698735B202412CDF7971320FDA579561/fulltext/pdf</idno>
<idno type="wicri:Area/Istex/Corpus">000108</idno>
<idno type="wicri:Area/Istex/Curation">000106</idno>
<idno type="wicri:Area/Istex/Checkpoint">000264</idno>
<idno type="wicri:doubleKey">0302-9743:2010:Arlandis J:rejection:threshold:estimation</idno>
<idno type="wicri:Area/Main/Merge">000689</idno>
<idno type="wicri:Area/Main/Curation">000684</idno>
<idno type="wicri:Area/Main/Exploration">000684</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title level="a" type="main" xml:lang="en">Rejection Threshold Estimation for an Unknown Language Model in an OCR Task</title>
<author>
<name sortKey="Arlandis, Joaquim" sort="Arlandis, Joaquim" uniqKey="Arlandis J" first="Joaquim" last="Arlandis">Joaquim Arlandis</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Espagne</country>
<wicri:regionArea>Instituto Tecnológico de Informática, Universitat Politècnica de València, Camí de Vera s/n, 46071, València</wicri:regionArea>
<wicri:noRegion>València</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Espagne</country>
</affiliation>
</author>
<author>
<name sortKey="Perez Cortes, Juan Carlos" sort="Perez Cortes, Juan Carlos" uniqKey="Perez Cortes J" first="Juan-Carlos" last="Perez-Cortes">Juan-Carlos Perez-Cortes</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Espagne</country>
<wicri:regionArea>Instituto Tecnológico de Informática, Universitat Politècnica de València, Camí de Vera s/n, 46071, València</wicri:regionArea>
<wicri:noRegion>València</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Espagne</country>
</affiliation>
</author>
<author>
<name sortKey="Navarro Cerdan, Ramon" sort="Navarro Cerdan, Ramon" uniqKey="Navarro Cerdan R" first="Ramon" last="Navarro-Cerdan">Ramon Navarro-Cerdan</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Espagne</country>
<wicri:regionArea>Instituto Tecnológico de Informática, Universitat Politècnica de València, Camí de Vera s/n, 46071, València</wicri:regionArea>
<wicri:noRegion>València</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Espagne</country>
</affiliation>
</author>
<author>
<name sortKey="Llobet, Rafael" sort="Llobet, Rafael" uniqKey="Llobet R" first="Rafael" last="Llobet">Rafael Llobet</name>
<affiliation wicri:level="1">
<country xml:lang="fr">Espagne</country>
<wicri:regionArea>Instituto Tecnológico de Informática, Universitat Politècnica de València, Camí de Vera s/n, 46071, València</wicri:regionArea>
<wicri:noRegion>València</wicri:noRegion>
</affiliation>
<affiliation wicri:level="1">
<country wicri:rule="url">Espagne</country>
</affiliation>
</author>
</analytic>
<monogr></monogr>
<series>
<title level="s">Lecture Notes in Computer Science</title>
<imprint>
<date>2010</date>
</imprint>
<idno type="ISSN">0302-9743</idno>
<idno type="eISSN">1611-3349</idno>
<idno type="ISSN">0302-9743</idno>
</series>
<idno type="istex">9783140E698735B202412CDF7971320FDA579561</idno>
<idno type="DOI">10.1007/978-3-642-14980-1_73</idno>
<idno type="ChapterID">73</idno>
<idno type="ChapterID">Chap73</idno>
</biblStruct>
</sourceDesc>
<seriesStmt>
<idno type="ISSN">0302-9743</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass></textClass>
<langUsage>
<language ident="en">en</language>
</langUsage>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Abstract: In an OCR post-processing task, a language model is used to find the best transformation of the OCR hypothesis into a string compatible with the language. The cost of this transformation is used as a confidence value to reject the strings that are less likely to be correct, and the error rate of the accepted strings should be strictly controlled by the user. In this work, the expected error rate distribution of an unknown language model is estimated from a training set composed of known language models. This means that after building a new language model, the user should be able to automatically “fix” the expected error rate at an acceptable level instead of having to deal with an arbitrary threshold.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Espagne</li>
</country>
</list>
<tree>
<country name="Espagne">
<noRegion>
<name sortKey="Arlandis, Joaquim" sort="Arlandis, Joaquim" uniqKey="Arlandis J" first="Joaquim" last="Arlandis">Joaquim Arlandis</name>
</noRegion>
<name sortKey="Arlandis, Joaquim" sort="Arlandis, Joaquim" uniqKey="Arlandis J" first="Joaquim" last="Arlandis">Joaquim Arlandis</name>
<name sortKey="Llobet, Rafael" sort="Llobet, Rafael" uniqKey="Llobet R" first="Rafael" last="Llobet">Rafael Llobet</name>
<name sortKey="Llobet, Rafael" sort="Llobet, Rafael" uniqKey="Llobet R" first="Rafael" last="Llobet">Rafael Llobet</name>
<name sortKey="Navarro Cerdan, Ramon" sort="Navarro Cerdan, Ramon" uniqKey="Navarro Cerdan R" first="Ramon" last="Navarro-Cerdan">Ramon Navarro-Cerdan</name>
<name sortKey="Navarro Cerdan, Ramon" sort="Navarro Cerdan, Ramon" uniqKey="Navarro Cerdan R" first="Ramon" last="Navarro-Cerdan">Ramon Navarro-Cerdan</name>
<name sortKey="Perez Cortes, Juan Carlos" sort="Perez Cortes, Juan Carlos" uniqKey="Perez Cortes J" first="Juan-Carlos" last="Perez-Cortes">Juan-Carlos Perez-Cortes</name>
<name sortKey="Perez Cortes, Juan Carlos" sort="Perez Cortes, Juan Carlos" uniqKey="Perez Cortes J" first="Juan-Carlos" last="Perez-Cortes">Juan-Carlos Perez-Cortes</name>
</country>
</tree>
</affiliations>
</record>

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000684 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000684 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     ISTEX:9783140E698735B202412CDF7971320FDA579561
   |texte=   Rejection Threshold Estimation for an Unknown Language Model in an OCR Task
}}

This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024